DEVELOPMENT OF A PIAGETIAN-BASED WRITTEN TEST:
A CRITERION-REFERENCED APPROACH
An attempt was made to develop and validate a Piagetian-based
written test, with successful use of the logic of specific
Piagetian tasks defined as the criterion. Ninety-six randomly
selected 9- to 16-year-olds, stratified by age, were individually
presented the Piagetian tasks of pendulum, balance, and
combinations and were group-administered a 36-item logically
equivalent written test. Results indicated that a
criterion-referenced approach to constructing a Piagetian-based
written test of cognitive development is possible and that the
average age of change from concrete to formal operations is
consistent with previous research.




Traditionally, assessment of cognitive constructs has been based on
the work of Binet, with two methodological approaches: individual
or group-administered tests. These approaches have been based on
psychometric rigor and convenience, with little regard to
understanding why a subject performed as he did. An individual's
assessment and subsequent rating has been dependent on the mastery
of specific information and on his position relative to a norm
group within the normal curve model of probability. Consequently,
if an individual did not know that the Koran is the Islamic holy
book, or that the Apocrypha were the disputed books in the Bible,
he did not receive any credit toward a rating of his cognitive
prowess for those items. Because such tests generally have not been
based on a theory of psychological development, they have not been
adequate in assessing the development of specific constructs and,
in reality, have caused many problems of interpretation, especially
within the school situation.

Piaget has used a variation of the individual testing situation
(his méthode clinique) and has attempted to assess cognitive
constructs which do not depend upon knowing specific elements of
knowledge or upon how an individual performs relative to a norm
group within the normal curve model; rather, his work has focused
on assessing cognitive constructs that are necessary for competent
interaction with the world, are generally not teachable, and
develop in all individuals at different rates. Although cognitive
construct development is continuous, there are durations of time
(periods) within which the individual's cognitive behavior is
fairly stable and qualitatively different from the behavior of the
other periods. Within each period of stability, Piaget
distinguishes two subperiods: a beginning subperiod, where the
individual begins to manifest the logical cognitive characteristics
describing the overall period, but fails to consistently manifest
those characteristics and consequently at times regresses
cognitively and manifests characteristics of an earlier period; and
a second subperiod, where an individual consistently manifests the
logical cognitive characteristics of the overall period, generally
does not regress cognitively, and manifests sporadically the
logical cognitive constructs of the first subperiod of the next
period (Inhelder & Piaget, 1958). Although the logical cognitive
structures of each subperiod of an overall period are similar, they
are also different, and, as such, enable an individual to solve
different logical problems at different periods in his life.

Unfortunately, Piaget's méthode clinique, like the individual
method within the Binet tradition, is very time consuming and
difficult to employ. Much information can be obtained about one
person per unit of time, but very little information can be
obtained about many people in the same unit of time. A
Piagetian-based group-administered written test of logical
cognitive development would be able to provide much information
about many individuals per unit of time. Such a test would be a
criterion-referenced one, as it would provide ". . . scores that
tell what kinds of behavior individuals with those scores can
demonstrate [Nitko, 1970, p. 38]." The test could be designed with
several scales, each scale corresponding to the development of an
overall specific logical cognitive behavior, while subscales within
a scale would correspond to the developmental logical behaviors
associated with specific periods and subperiods.

The present study was an attempt to construct a group-administered
written test that would assess the same developmental logical
constructs as those assessed by specific Piagetian
individually-administered tasks by comparing response "patterns" on
the written test with response "patterns" on the Piagetian tasks.

Method of Investigation 
Construction of Written Instrument

Three scales, each corresponding to a specific set of developmental
logical cognitive behaviors, comprised the test. Each scale was
constructed with the suggestions implied by Glaser and Klaus (1962)
and Glaser and Cox (1962) for criterion-referenced measures and the
recommended specifications of Nitko (1970) used as guidelines:
(a) "The classes of behaviors that define different achievement
levels are specified as clearly as is possible before the test is
constructed [p. 38]." Behaviors defining the different logical
developmental behavior levels within each scale were worked out
according to data provided by Inhelder and Piaget (1958), where
each scale corresponded to a specific Piagetian task: exclusion -
pendulum; proportion - balance; combination - colored and colorless
liquids. According to Inhelder and Piaget, children manifest
different logical behaviors on each task, depending upon the period
of cognitive development that they are in. For each scale, each
subscale corresponded to the cognitive behaviors characteristic of
a different subperiod for the developmental logic of that scale.
[See Gray (1970) for a summary of the three scales, Piagetian
(sub)periods, and developmental logic used on the test.]
(b) ". . . each behavior class is defined by a set of test
situations (that is, test items or test tasks) in which the
behaviors can be displayed in terms of all their important nuances
[p. 38]." For each logical scale, each developmental level
(subscale) was defined by all of the written items that had the
same logical structure as those logical behaviors characteristic of
the specific Piagetian subperiods for the corresponding Piagetian
tasks. Although the logical structure of the items was the same,
the content was different. (c) ". . . given that the classes of
behavior have been specified and that the test situations have been
defined, a representative sampling plan is designed and used to
select the test tasks that will appear on any form of the test
[p. 38]." A total of thirty-six items were selected and adapted
from those used in a pilot study. Each item had five alternatives,
with the fifth alternative (e) always being "None of the above
answers is correct."



Distribution of correct alternatives was as follows: A, B, D =
8 each, C = 7, E = 5. Items were randomly assigned their item
number. Twelve items corresponded to the developmental logic of
the pendulum, 12 items corresponded to the developmental logic of
the balance, and 12 items corresponded to the developmental logic
of combinations of colored and colorless liquids. Each scale of 12
items was divided into 6 items reflecting the logic of concrete
operations (3 items for beginning concrete - concrete I; 3 items
for concrete - concrete II) and 6 items reflecting the logic of
formal operations (3 items for beginning formal - formal I; 3 items
for formal - formal II). (d) ". . . the obtained score must be
capable of expressing objectively and meaningfully the individual's
performance characteristics in these classes of behavior [p. 38]."
For each scale, subjects were given scores based on their patterns
of correct and incorrect responses. For example, a subject
classified as concrete II on the logic of combinations could use
the logic of one-to-many and one-to-one logical multiplication and
generally could not use the logic of combinations or permutation.

The general test directions and each item were controlled for
reading difficulty by applying the Dale-Chall Readability Formula,
with all but three of the items rated as fourth grade or lower. The
remaining three items were rated at fifth - sixth grade difficulty.
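
The readability check can be illustrated with a short sketch. It uses the Dale-Chall formula as it is commonly cited from the 1948 publication; the function names are hypothetical, and the inputs (percentage of words not on the Dale word list, average sentence length) are assumed to have been counted beforehand. The paper does not report the raw scores it obtained, so this is an illustration of the rating step only, not a reconstruction of the author's computation.

```python
def dale_chall_raw_score(pct_unfamiliar_words, avg_sentence_length):
    """Raw score from the commonly cited 1948 Dale-Chall formula.

    pct_unfamiliar_words: percentage (0-100) of words not on the Dale list.
    avg_sentence_length: mean number of words per sentence.
    """
    return 0.1579 * pct_unfamiliar_words + 0.0496 * avg_sentence_length + 3.6365


def grade_band(raw_score):
    """Map a raw score to a corrected grade band (lowest two bands spelled out)."""
    if raw_score < 5.0:
        return "4th grade or below"
    if raw_score < 6.0:
        return "5th-6th grade"
    return "7th grade or above"
```

Under this sketch, an item with no unfamiliar words and ten-word sentences falls in the lowest band, which corresponds to the "fourth grade or lower" rating reported for most items.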
Sample 

Subjects were stratified by age by rounding their ages off to the
nearest whole age. Within each age level, a random sample of twelve
subjects per age was selected, for a total of 96 subjects from 9-16
years of age. No student who was known to have reading problems was
included.
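
The sampling plan described above can be sketched as follows. The function name, the `(student_id, exact_age)` input format, and the fixed seed are assumptions made for illustration; the paper does not describe how the random selection was actually carried out.

```python
import random
from collections import defaultdict


def stratified_sample(students, ages=range(9, 17), per_age=12, seed=0):
    """Draw `per_age` subjects at random within each whole-age stratum.

    students: iterable of (student_id, exact_age_in_years) pairs (hypothetical format).
    Ages are rounded to the nearest whole year, with halves rounded up.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for student_id, exact_age in students:
        strata[int(exact_age + 0.5)].append(student_id)
    # random.sample raises ValueError if a stratum has fewer than per_age students.
    return {age: rng.sample(strata[age], per_age) for age in ages}
```

With eight age levels and twelve subjects per level, the result contains the 96 subjects reported in the study.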
Procedures 

For each age level, one-half of the subjects were given the
following Piagetian tasks: (a) Oscillation of a Pendulum,
(b) Equilibrium in the Balance, and (c) Combinations of Colored
and Colorless Chemical Bodies, first; and the written test,
second. The remaining subjects were given the written test first
and the Piagetian tasks second. Administration of the Piagetian
tasks followed the guidelines "suggested" by Inhelder and Piaget
(1958). All verbalizations were audio recorded, and the
experimenter rated each subject's competence on each task on a
behavioral rating sheet designed in accordance with the
developmental level characteristics of subjects working with the
three problems (Inhelder & Piaget, 1958). After one-half of the
subjects in each age group had been tested with the Piagetian
tasks, the written test was given to all subjects in a large group
situation.

On each Piagetian task and each scale, subjects were classified as
preoperational, concrete I, concrete II, formal I, or formal II.
Classification criteria for the Piagetian tasks were those used by
Inhelder and Piaget (1958). Classification criteria for each
written scale were adapted from Longeot's (n.d.) and based on
subscale-scale response patterns: preoperational - less than two
correct responses for each subscale; concrete I - at least two
correct responses on the concrete I items and less than two correct
responses on each of the other subscales; concrete II - at least
two correct responses on each set of concrete items and less than
two correct responses on each set of formal items; formal I - at
least two correct responses on each set of concrete items and the
formal I items, and less than two correct responses on the formal
II items; formal II - at least two correct responses on each set of
items.
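
The written-scale criteria above amount to a lookup on which subscales reach the two-correct threshold. The following is a minimal sketch under that reading; the names `classify_scale` and `IDEAL_PATTERNS` are hypothetical, and the handling of patterns that fit no criterion (returning `None`) is an assumption, since the paper does not say how such patterns were finally resolved.

```python
SUBSCALES = ("concrete I", "concrete II", "formal I", "formal II")

# Each key records whether the subscale threshold was met, in SUBSCALES order.
IDEAL_PATTERNS = {
    (False, False, False, False): "preoperational",
    (True, False, False, False): "concrete I",
    (True, True, False, False): "concrete II",
    (True, True, True, False): "formal I",
    (True, True, True, True): "formal II",
}


def classify_scale(correct_counts, pass_mark=2):
    """Classify one scale from the number correct (0-3) on each of its subscales.

    Returns the level name, or None when the pattern meets none of the criteria.
    """
    if len(correct_counts) != len(SUBSCALES):
        raise ValueError("expected one count per subscale")
    passed = tuple(count >= pass_mark for count in correct_counts)
    return IDEAL_PATTERNS.get(passed)
```

For example, `classify_scale((3, 2, 1, 0))` gives "concrete II", while a pattern such as `(0, 3, 0, 0)` meets none of the criteria and returns `None`.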

The criteria were not met by 13.5% (39/288) of the patterns.
Of the 39 non-ideal patterns, 18 were easily classifiable, leaving
only 7.29% (21/288) response patterns which did not meet the
classification criteria and were considered to be difficult
patterns to classify.

Results 

For each type of logic, there was no significant transfer from one
type of test (Piagetian, written) to the other. Subjects who took
the Piagetian tasks first and the written test second did no better
than subjects taking the written test first and the Piagetian tasks
second (Exclusion - Pendulum, t = 1.02; Proportion - Balance,
t = -.13; Combination - Chemicals, t = .81, df = 94).
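
The reported df = 94 is consistent with a pooled-variance comparison of two order groups of 48 subjects each (48 + 48 - 2). The sketch below shows that computation; treating the test as a pooled two-sample t test is an assumption, as the paper does not name the exact procedure, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```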
Convergent and Discriminant Validity

A multitrait-multimethod matrix (Campbell & Fiske, 1959)
for the intercorrelations among the three Piagetian tasks and
the three scales on the written test appears in Table 1. All



Insert Table 1 About Here 
off-diagonal entries are significant (p < .005, df = 94, one-tail).
Estimates of Kuder-Richardson reliabilities are moderate, except
for the written combinations scale. The validity values are of
moderate size and greater than those considered substantial by
Campbell and Fiske (1959); consequently, Campbell and Fiske's
criterion for convergent validity was met.

Evidence of the uniqueness of each set of logical behaviors from
the others is less clear. The validity values meet Campbell and
Fiske's first two criteria for discriminant validity (validity
values should be greater than the respective row-column entries in
the heteromethod triangles and the monomethod triangles) for most,
but not all, comparisons (See Table 2).



Insert Table 2 About Here 



All of the comparisons that did not meet the criteria involved a
measure of the logic of exclusion, and the differences between the
entries were small, the greatest having an absolute value of .022
(proportion validity vs. proportion-exclusion value in written
monomethod triangle). The pattern of intercorrelations within the
respective triangles also is not clear, as the patterns in the
heteromethod triangles are different from each other and also
different from the patterns in the monomethod triangles, which are
the same.

For each set of developmental logic, there is definite evidence of
convergent validity, but little evidence of discriminant validity,
even though Campbell and Fiske (1959) state that the second of
their discriminant validity criteria (validity values should be
greater than respective row-column entries in the monomethod
triangles) is an ideal criterion and not generally met.
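
The two discriminant validity comparisons discussed above can be sketched as a check over a multitrait-multimethod matrix. The data layout (a dictionary keyed by pairs of `(trait, method)` measures) and all names are hypothetical; the values in Table 1 are not reproduced here, so any matrix passed in is the reader's own.

```python
TRAITS = ("exclusion", "proportion", "combination")
METHODS = ("task", "written")


def r(corr, a, b):
    """Look up a correlation stored under either ordering of the pair."""
    return corr[(a, b)] if (a, b) in corr else corr[(b, a)]


def discriminant_failures(corr):
    """List comparisons in which a validity value is not the larger entry.

    Each failure is (trait, other_trait, triangle, competing_value).
    """
    failures = []
    for trait in TRAITS:
        task, written = (trait, "task"), (trait, "written")
        validity = r(corr, task, written)
        for other in TRAITS:
            if other == trait:
                continue
            comparisons = {
                # Different trait, different method: same row or column as the validity value.
                "heteromethod": [r(corr, task, (other, "written")),
                                 r(corr, (other, "task"), written)],
                # Different trait, same method.
                "monomethod": [r(corr, task, (other, "task")),
                               r(corr, written, (other, "written"))],
            }
            for triangle, values in comparisons.items():
                for value in values:
                    if validity <= value:
                        failures.append((trait, other, triangle, value))
    return failures
```

An empty list means both criteria hold for every comparison; each tuple in a non-empty list names the trait pair and triangle where the validity value was matched or exceeded.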

Written Test 

Table 3 presents mean item rankings for the three scales.
For each scale, Pearson r's between the mean predicted rank



Insert Table 3 About Here 



for each item and the mean empirical rank for each item were
computed. All three correlations are significant, but only in the
combination scale is there no interchanging of items from the
different subscales. The two items from adjacent subscales with a
difficulty of 16 are the only possible exceptions. Note that 8 of
the combination items are extremely difficult, whereas only one
item from the other scales is as difficult. The "cellar effect"
definitely restricted the range of scores for the written
combination scale, resulting in the medium-low correlations and
reliability for the scale (See Table 1). In effect, the
correlations involving the written combination scale were
artificially depressed and, in reality, are probably much higher.
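
The item-sequence analysis can be sketched in two parts: a Pearson correlation between predicted and empirical ranks, and a check for items "interchanging" across subscales. Both function names are hypothetical, and the convention that a higher rank means a harder item is an assumption; Table 3 values are not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def subscales_interchanged(empirical_rank, subscale_level):
    """True if any item from a lower subscale ranks harder than an item from a higher one.

    Assumes a larger rank means a more difficult item.
    """
    pairs = list(zip(empirical_rank, subscale_level))
    return any(rank_lo > rank_hi
               for rank_lo, level_lo in pairs
               for rank_hi, level_hi in pairs
               if level_lo < level_hi)
```

A scale can show a significant correlation and still have interchanged items, which is the situation the text describes for the scales other than combination.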
Age and Sex

For each of the written scales and each Piagetian task, a one-way
ANOVA with unequal cell frequencies was run across ages, with an
ANOVA performed for each sex and the total sample. Scheffé's
technique was then applied to the data of those ANOVAs that were
significant at .05. The ages between which the greatest increase in
mean classifications occurred were determined. Age levels below the
increase were considered as one comparison group, while age levels
above the increase were considered as the second comparison group.
Comparison groups were chosen with the assumption that the ages
between which the greatest increase in mean classification occurred
reflected the ages at which the majority of subjects made the
transition from concrete operations to formal operations.
Consequently, the Scheffé comparisons were taken to be between
concrete operational and formal operational subjects. Table 4
summarizes these results.
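
The step of locating the greatest increase and forming the two comparison groups can be sketched as follows. The function name and the input format (a mapping from age to mean classification) are hypothetical, and the sketch covers only the grouping step, not the ANOVA or the Scheffé comparison itself.

```python
def split_at_greatest_increase(mean_by_age):
    """Find the adjacent ages with the largest rise in mean classification.

    Returns ((lower_age, upper_age), ages_below, ages_above).
    """
    ages = sorted(mean_by_age)
    jumps = [(mean_by_age[hi] - mean_by_age[lo], lo, hi)
             for lo, hi in zip(ages, ages[1:])]
    _, lo, hi = max(jumps)
    below = [age for age in ages if age <= lo]
    above = [age for age in ages if age >= hi]
    return (lo, hi), below, above
```

The two returned age lists correspond to the groups treated in the text as concrete operational and formal operational subjects.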



Insert Table 4 About Here 



In all cases where the original ANOVA was significant, the Scheffé
comparison was significant at least at the level (p < .10)
suggested by Scheffé (Ferguson, 1971) and, in most cases, at a
lower level of probability. The ages at which the greatest increase
in scores occurs are generally different, depending on the
developmental logic measured, the method of assessment, and the sex
of the subjects, although the greatest number of "jumps" in mean
scores occurred between twelve and fourteen years of age.

Discussion 

The correlations between the two methods measuring the same set of
developmental logic (validity values) along with moderate
reliabilities are encouraging in that they are sufficiently large
to support the conclusion that a written test using the
developmental logic postulated by Piaget as its behavioral
criterion is definitely possible, although there is room for
improvement in this particular attempt. Also, the evidence of
convergent validity is supportive of the generalization of
Piagetian theory to "non-Piagetian" tasks. This lends credence to
Piaget's belief that his conception of developmental levels is
evidenced in Piagetian tasks and other tasks (Inhelder & Piaget,
1958). Psychometrically, the lack of discriminant validity of the
developmental logics is disappointing and would indicate a definite
effect of method variance (See Table 1); yet this same lack of
clearcut discrimination between the different sets of developmental
logic provides support for Piaget's contention that a set of
developmental logic is only one manifestation of an individual's
general reasoning level; generally, when one set of logic has
developed, other logics characteristic of that period should also
have developed [See logic in Inhelder and Piaget (1958) and Gray
(1970)].

The correspondence between the predicted and empirical written item
sequences is excellent, indicating that Piaget's hypothesis of the
developmental sequence of logical behaviors can be measured using
Piagetian-based logical written problems. An exception is the
exclusion concrete II items, on the average, being easier than the
concrete I items. Both sets of items were serializations, with the
concrete I items composed of three entities for comparison and the
concrete II items composed of four entities for comparison. The
three items using the logical comparison "greater than,"
irrespective of number of entities to be compared, were all easier
than the three items using the logical comparison "less than," but
a Scheffé comparison between the mean
difficulties of the two sets was not significant (iB/d =1.49, 
p < .10). 

The extremely low difficulties and validity value for the logic of
combinations would seem to indicate that the recognition-oriented
multiple-choice format is not sensitive enough to measure the
combinatorial ability of subjects. Rather, it appears, based on
work by Longeot (1962, 1964, n.d.) and current work of the author,
that an open-ended type of question, where the subject is required
to generate the combinations, is much more sensitive in measuring
combinatorial ability. The open-ended type of question is certainly
more "in the spirit" of Piaget, where the subject generally has to
generate his own answers and not select the "best one" from a
predetermined list.

Evidence of the subjects' possible past experience with written
proportion types of questions can be seen in the proportion
"cutoff" ages in Table 4. The "cutoff" age for the written
proportions across sex and total sample is consistently a minimum
of two years younger compared to the "cutoff" ages for its logical
counterpart, the balance, and any other comparison with the
exception of males and total sample for the pendulum. This would
indicate that the written proportions may be tapping past specific
learnings as well as the logical operations of proportions,
although such staggering of the "cutoff" ages is not uncommon (See
Lovell, 1971; Piaget, 1971) or unreasonable, and the "cutoffs"
reported are generally consistent with previously reported results
for similar type items (Winch, 1922a, 1922b; Burt, 1919; Longeot,
1962, 1964).

It appears that a Piagetian-based written test of logical cognitive
development is possible if it is constructed according to
behaviorally-oriented guidelines for criterion-referenced
measurement. Certainly such a test is desirable, considering the
traditional problems of evaluating cognitive skills and the
problems associated with adequate measurement of skills in such
individualized instruction programs as IGE. If such a test can be
refined, a series of developmentally-based criterion-referenced
tests which would demand the same cognitive skills, but for
different content areas, could be constructed. Such a series of
tests would have an advantage over current tests of being able to
more accurately determine the reasoning level of a student within a
specific content domain, and, hopefully, facilitate instruction and
learning. At worst, it would be a device based on the actual
cognitive development of children rather than something that is
merely statistically convenient.